Goto

Collaborating Authors

 paper source tracing


LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach

arXiv.org Artificial Intelligence

We participated in the KDD CUP 2024 paper source tracing competition and achieved the 3rd place. This competition tasked participants with identifying the reference sources (i.e., ref-sources, as referred to by the organizers of the competition) of given academic papers. Unlike most teams that addressed this challenge by fine-tuning pre-trained neural language models such as BERT or ChatGLM, our primary approach utilized closed-source large language models (LLMs). With recent advancements in LLM technology, closed-source LLMs have demonstrated the capability to tackle complex reasoning tasks in zero-shot or few-shot scenarios. Consequently, in the absence of GPUs, we employed closed-source LLMs to directly generate predicted reference sources from the provided papers. We further refined these predictions through ensemble learning. Notably, our method was the only one among the award-winning approaches that did not require the use of GPUs for model training. Code available at https://github.com/Cklwanfifa/KDDCUP2024-PST.


Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing

arXiv.org Artificial Intelligence

Identifying significant references within the complex interrelations of a citation knowledge graph is challenging, which encompasses connections through citations, authorship, keywords, and other relational attributes. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles utilizing advanced data mining techniques. In the KDD CUP OAG-Challenge PST track, we design a recommendation-based framework tailored for the PST task. This framework employs the Neural Collaborative Filtering (NCF) model to generate final predictions. To process the textual attributes of the papers and extract input features for the model, we utilize SciBERT, a pre-trained language model. According to the experimental results, our method achieved a score of 0.37814 on the Mean Average Precision (MAP) metric, outperforming baseline models and ranking 11th among all participating teams. The source code is publicly available at https://github.com/MyLove-XAB/KDDCupFinal.